Analysis of Seattle Fremont Bridge Bike Traffic


In [28]:
%matplotlib inline

import matplotlib.pyplot as plt
from jubiiworkflow.data import get_data

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Note: on matplotlib >= 3.6 this style is named 'seaborn-v0_8'.
plt.style.use('seaborn')

Get Data


In [66]:
data = get_data()
p = data.resample('W').sum().plot()
p.set_ylim(0, None);


The plot above shows total bike counts summed per week. Let's investigate what the pattern looks like when we look at hourly rates on individual days.
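For reference, the weekly aggregation in the cell above can be checked on a toy series. This is a minimal sketch on synthetic data (the `demo` series is hypothetical, not the bridge counts); in pandas the 'W' alias bins by weeks ending on Sunday.

```python
import numpy as np
import pandas as pd

# Two full weeks of hourly timestamps, starting on Monday 2020-01-06.
idx = pd.date_range('2020-01-06', periods=14 * 24, freq=pd.Timedelta(hours=1))

# Hypothetical counts: one rider per hour, so each weekly sum equals the hours in that week.
demo = pd.Series(np.ones(len(idx), dtype=int), index=idx)

weekly = demo.resample('W').sum()
print(weekly)
```

Each bin is labeled with the Sunday that closes the week, which is why the x-axis of a weekly plot steps in seven-day increments.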

Pivot

Plot the hourly traffic rates for every day in the data.


In [67]:
pivoted = data.pivot_table('Total', index=data.index.time, columns=data.index.date)
pivoted.plot(legend=False, alpha=0.01);


We can see two types of lines in this graph: one type with two peaks, and another type with a single peak in the middle of the day. We hypothesize that this reflects a difference between weekdays and weekends. Let's investigate further.
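To make the shape of the pivoted table concrete, here's a minimal sketch of the same pivot on a tiny synthetic frame. The `demo` data is hypothetical (three days where the count simply equals the hour of day); only the `pivot_table` call mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Three days of hourly timestamps.
idx = pd.date_range('2020-01-06', periods=3 * 24, freq=pd.Timedelta(hours=1))

# Hypothetical 'Total' column: the count equals the hour of day (0..23).
demo = pd.DataFrame({'Total': np.asarray(idx.hour)}, index=idx)

# Rows become time of day, columns become calendar dates.
pivoted_demo = demo.pivot_table('Total', index=demo.index.time, columns=demo.index.date)
print(pivoted_demo.shape)
print(pivoted_demo.head())
```

Each column is one day's 24-hour profile, so plotting the frame draws one line per day.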

Principal Component Analysis


In [17]:
# One row per day, one column per hour of the day; missing hours are filled with 0.
X = pivoted.fillna(0).T.values
X.shape


Out[17]:
(1671, 24)

In [68]:
X2 = PCA(2, svd_solver='full').fit_transform(X)
X2.shape


Out[68]:
(1671, 2)

In [69]:
plt.scatter(X2[:, 0], X2[:, 1]);
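
To get a feel for how much of the day-to-day variation two components can capture, one option is to look at `explained_variance_ratio_` on the fitted PCA object. This is a minimal sketch on synthetic profiles, not the bridge data: `commute` and `midday` are made-up shapes standing in for the two line types seen in the pivot plot.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
hours = np.arange(24)

# Hypothetical daily profiles: a two-peak "commute" shape and a single midday peak.
commute = np.exp(-0.5 * ((hours - 8) / 1.5) ** 2) + np.exp(-0.5 * ((hours - 17) / 1.5) ** 2)
midday = np.exp(-0.5 * ((hours - 13) / 3.0) ** 2)

# 150 commute-like days and 50 midday-like days, each with a little noise.
X_demo = np.vstack([
    commute + 0.05 * rng.standard_normal((150, 24)),
    midday + 0.05 * rng.standard_normal((50, 24)),
])

pca = PCA(2, svd_solver='full')
X2_demo = pca.fit_transform(X_demo)

print(X2_demo.shape)
print(pca.explained_variance_ratio_)
```

If the two ratios add up to most of the variance, the 2-D scatter is a faithful summary of the 24-dimensional daily profiles.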


Unsupervised Clustering


In [59]:
gmm = GaussianMixture(2).fit(X)
labels = gmm.predict(X)

In [60]:
plt.scatter(X2[:, 0], X2[:, 1], c=labels, cmap='rainbow')
plt.colorbar();



In [61]:
fig, ax = plt.subplots(1, 2, figsize=(14, 6))

pivoted.T[labels == 0].T.plot(legend=False, alpha=0.1, ax=ax[0]);
pivoted.T[labels == 1].T.plot(legend=False, alpha=0.1, ax=ax[1]);

ax[0].set_title('Purple Cluster')
ax[1].set_title('Red Cluster');
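
One caveat: GaussianMixture does not guarantee which component ends up as label 0, so the cluster titles above depend on the particular fit. A minimal sketch of picking a cluster by its size rather than its label number, on synthetic blobs (the data and the `_demo` names are hypothetical, and treating the smaller cluster as "weekend-like" is an assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated synthetic blobs: 300 points near (0, 0), 100 near (10, 10).
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 2)),
    rng.normal(10.0, 1.0, size=(100, 2)),
])

# random_state makes the fit repeatable from run to run.
gmm_demo = GaussianMixture(2, random_state=0).fit(X_demo)
labels_demo = gmm_demo.predict(X_demo)

# Identify the smaller cluster by member count instead of assuming it is label 0.
counts = np.bincount(labels_demo, minlength=2)
small = int(np.argmin(counts))
is_small = labels_demo == small

print(counts, small)
```

A mask like `is_small` can then stand in for a hard-coded `labels == 0` when re-running the notebook.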


Comparing with Day of Week


In [26]:
dayofweek = pd.DatetimeIndex(pivoted.columns).dayofweek

plt.scatter(X2[:, 0], X2[:, 1], c=dayofweek, cmap='rainbow')
plt.colorbar();


Analyzing Outliers

The following dates are weekdays that landed in the "weekend" cluster:


In [27]:
dates = pd.DatetimeIndex(pivoted.columns)
# Label 0 is the "weekend" cluster in this fit; dayofweek < 5 selects Monday through Friday.
dates[(labels == 0) & (dayofweek < 5)]


Out[27]:
DatetimeIndex(['2012-11-22', '2012-11-23', '2012-12-24', '2012-12-25',
               '2013-01-01', '2013-05-27', '2013-07-04', '2013-07-05',
               '2013-09-02', '2013-11-28', '2013-11-29', '2013-12-20',
               '2013-12-24', '2013-12-25', '2014-01-01', '2014-04-23',
               '2014-05-26', '2014-07-04', '2014-09-01', '2014-11-27',
               '2014-11-28', '2014-12-24', '2014-12-25', '2014-12-26',
               '2015-01-01', '2015-05-25', '2015-07-03', '2015-09-07',
               '2015-11-26', '2015-11-27', '2015-12-24', '2015-12-25',
               '2016-01-01', '2016-05-30', '2016-07-04', '2016-09-05',
               '2016-11-24', '2016-11-25', '2016-12-26', '2017-01-02',
               '2017-02-06'],
              dtype='datetime64[ns]', freq=None)

Conclusion

It looks like the weekdays in the "weekend" cluster correspond to US holidays such as Christmas, the Fourth of July, and Thanksgiving. Feb 6, 2017 is instead due to extreme weather (a blizzard) on that day, which caused major shutdowns across the city.
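
One way to check the holiday explanation is to compare dates against the US federal holiday calendar that ships with pandas. This is a minimal sketch, not part of the original analysis: `sample` is a hand-picked subset of the dates listed above, and days such as the Friday after Thanksgiving are not federal holidays, so the calendar will not flag them.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hand-picked subset of the outlier dates listed above (illustrative only).
sample = pd.DatetimeIndex(['2013-07-04', '2014-12-25', '2013-11-29', '2017-02-06'])

# Federal holidays (as observed) over the span of the data.
holidays = USFederalHolidayCalendar().holidays(start='2012-01-01', end='2017-12-31')
is_holiday = sample.isin(holidays)

print(sample[is_holiday])   # dates the calendar recognizes
print(sample[~is_holiday])  # dates that need another explanation
```

Dates left unflagged are the ones worth a closer look, for example bridge days next to a holiday or weather events.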